{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SQL TO KQL Conversion (Experimental)\n",
    "\n",
    "The `sql_to_kql` module is a simple converter to KQL based on [mo-sql-parsing](https://github.com/klahnakoski/mo-sql-parsing).\n",
    "It is an experimental feature built to help us convert a few queries but we\n",
    "thought that it was useful enough to include in MSTICPy.\n",
    "\n",
    "You must have msticpy installed along with the mo-sql-parser package to run this notebook:\n",
    "```\n",
    "%pip install --upgrade msticpy[sql2kql]\n",
    "```\n",
    "\n",
    "It supports a subset of ANSI SQL-92 which includes the following:\n",
    "- SELECT (including column renaming and functions)\n",
    "- FROM (including from subquery)\n",
    "- WHERE (common string and int operations, LIKE, some common functions)\n",
    "- LIMIT\n",
    "- UNION, UNION ALL\n",
    "- JOIN - only tested for relatively simple join expressions\n",
    "- GROUP BY\n",
    "- SQL Comments (ignored)\n",
    "\n",
    "It does not support HAVING, multiple SQL statements or anything complex like Common Table Expressions.\n",
    "\n",
    "It does support a few additional Spark SQL extensions like RLIKE.\n",
    "\n",
    "## Caveat Emptor!\n",
    "This module is included in MSTICPy in the hope that it might be useful to others.\n",
    "We do not intend to expand its capabilities.\n",
    "\n",
    "It is also not guaranteed to produce perfectly-executing KQL - there will likely\n",
    "be things that you have to fix up in the output query. \n",
    "You will, for example, nearly always need change\n",
    "the names of the fields used since the source data tables are unlikely\n",
    "to exactly match the schema of your Kusto/Azure Sentinel target.\n",
    "\n",
    "The module does include an elementary table name mapping function that we\n",
    "demonstrate below.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "import os\n",
    "import sys\n",
    "import warnings\n",
    "from IPython.display import display, HTML, Markdown\n",
    "            \n",
    "import msticpy\n",
    "msticpy.init_notebook(namespace=globals(), verbosity=0)\n",
    "\n",
    "from msticpy.data.sql_to_kql import sql_to_kql"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simple SQL Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "apt29Host\n",
      "| where Channel == 'Microsoft-Windows-Sysmon/Operational'\n",
      "  and EventID between (1 .. 10)\n",
      "  and tolower(ParentImage) endswith 'explorer.exe'\n",
      "  and EventID in ('4', '5', '6')\n",
      "  and tolower(Image) startswith '3aka3'\n",
      "| project Message, Otherfield\n",
      "| distinct Message, Otherfield\n",
      "| limit 10\n"
     ]
    }
   ],
   "source": [
    "sql = \"\"\"\n",
    "SELECT DISTINCT Message, Otherfield\n",
    "FROM apt29Host\n",
    "WHERE Channel = \"Microsoft-Windows-Sysmon/Operational\"\n",
    "    AND EventID BETWEEN 1 AND 10\n",
    "    AND LOWER(ParentImage) LIKE '%explorer.exe'\n",
    "    AND EventID IN ('4', '5', '6')\n",
    "    AND LOWER(Image) LIKE \"3aka3%\"\n",
    "LIMIT 10\n",
    "\"\"\"\n",
    "\n",
    "kql = sql_to_kql(sql)\n",
    "print(kql)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SQL Joins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "apt29Host\n",
      "| project EventID, ParentImage, Image, Message, Otherfield\n",
      "| join kind=inner (MyTable\n",
      "  | project Message, evt_id) on $right.Message == $left.Message\n",
      "  and $right.evt_id == $left.EventID\n",
      "| where Channel == 'Microsoft-Windows-Sysmon/Operational'\n",
      "  and EventID == 1\n",
      "  and tolower(ParentImage) endswith 'explorer.exe'\n",
      "  and tolower(Image) startswith '.*3aka3'\n",
      "| summarize any(Message), any(Otherfield), dcount(EventID) by EventID\n",
      "| order by Message desc, Otherfield\n",
      "| limit 10\n"
     ]
    }
   ],
   "source": [
    "sql=\"\"\"\n",
    "SELECT DISTINCT Message, Otherfield, COUNT(DISTINCT EventID)\n",
    "FROM (SELECT EventID, ParentImage, Image, Message, Otherfield FROM apt29Host) as A\n",
    "--FROM A\n",
    "INNER JOIN (Select Message, evt_id FROM MyTable ) on MyTable.Message == A.Message and MyTable.evt_id == A.EventID\n",
    "WHERE Channel = \"Microsoft-Windows-Sysmon/Operational\"\n",
    "    AND EventID = 1\n",
    "    AND LOWER(ParentImage) LIKE \"%explorer.exe\"\n",
    "    AND LOWER(Image) RLIKE \".*3aka3%\"\n",
    "GROUP BY EventID\n",
    "ORDER BY Message DESC, Otherfield\n",
    "LIMIT 10\n",
    "\"\"\"\n",
    "\n",
    "kql = sql_to_kql(sql)\n",
    "print(kql)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table Renaming"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SecurityEvent\n",
      "| project EventID, ParentImage, Image, Message, Otherfield\n",
      "| join kind=inner (SigninLogs\n",
      "  | project Message, evt_id) on $right.Message == $left.Message\n",
      "  and $right.evt_id == $left.EventID\n",
      "| where Channel == 'Microsoft-Windows-Sysmon/Operational'\n",
      "  and EventID == 1\n",
      "  and tolower(ParentImage) endswith 'explorer.exe'\n",
      "  and tolower(Image) startswith '.*3aka3'\n",
      "| summarize any(Message), any(Otherfield), dcount(EventID) by EventID\n",
      "| order by Message desc, Otherfield\n",
      "| limit 10\n"
     ]
    }
   ],
   "source": [
    "sql=\"\"\"\n",
    "SELECT DISTINCT Message, Otherfield, COUNT(DISTINCT EventID)\n",
    "FROM (SELECT EventID, ParentImage, Image, Message, Otherfield FROM apt29Host) as A\n",
    "INNER JOIN (Select Message, evt_id FROM MyTable ) on MyTable.Message == A.Message and MyTable.evt_id == A.EventID\n",
    "WHERE Channel = \"Microsoft-Windows-Sysmon/Operational\"\n",
    "    AND EventID = 1\n",
    "    AND LOWER(ParentImage) LIKE \"%explorer.exe\"\n",
    "    AND LOWER(Image) RLIKE \".*3aka3%\"\n",
    "GROUP BY EventID\n",
    "ORDER BY Message DESC, Otherfield\n",
    "LIMIT 10\n",
    "\"\"\"\n",
    "\n",
    "table_map = {\"apt29Host\": \"SecurityEvent\", \"MyTable\": \"SigninLogs\"}\n",
    "\n",
    "kql = sql_to_kql(sql, table_map)\n",
    "print(kql)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Join with Aliases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SecurityEvent\n",
      "| join kind=inner (SecurityEvent\n",
      "  | where Channel == 'Microsoft-Windows-Sysmon/Operational'\n",
      "  and EventID == 1\n",
      "  and tolower(ParentImage) matches regex '.*partial.string.*'\n",
      "  and tolower(Image) endswith 'cmd.exe'\n",
      "  | project ProcessGuid) on $left.ParentProcessGuid == $right.ProcessGuid\n",
      "| where Channel == 'Microsoft-Windows-Sysmon/Operational'\n",
      "  and EventID == 1\n",
      "  and tolower(Image) endswith 'powershell.exe'\n",
      "| project Message\n"
     ]
    }
   ],
   "source": [
    "sql=\"\"\"\n",
    "SELECT Message\n",
    "FROM apt29Host a\n",
    "INNER JOIN (\n",
    "    SELECT ProcessGuid\n",
    "    FROM apt29Host\n",
    "    WHERE Channel = \"Microsoft-Windows-Sysmon/Operational\"\n",
    "        AND EventID = 1\n",
    "        AND LOWER(ParentImage) RLIKE '.*partial_string.*'\n",
    "        AND LOWER(Image) LIKE '%cmd.exe'\n",
    ") b\n",
    "ON a.ParentProcessGuid = b.ProcessGuid\n",
    "WHERE Channel = \"Microsoft-Windows-Sysmon/Operational\"\n",
    "    AND EventID = 1\n",
    "    AND LOWER(Image) LIKE '%powershell.exe'\n",
    "\"\"\"\n",
    "\n",
    "kql = sql_to_kql(sql, table_map)\n",
    "print(kql)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unions and Group By"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SecurityEvent\n",
      "| project EventID, ParentImage, Image, Message, Otherfield\n",
      "| union (SecurityEvent\n",
      "  | project EventID, ParentImage, Image, Message, Otherfield\n",
      "  | join kind=inner (SigninLogs) on $right.mssg == $left.Message\n",
      "  | where Channel == 'Microsoft-Windows-Sysmon/Operational'\n",
      "  and EventID == 1\n",
      "  and tolower(ParentImage) endswith 'explorer.exe'\n",
      "  and tolower(Image) startswith '.*3aka3'\n",
      "  | project Message, Otherfield, EventID\n",
      "  | distinct Message, Otherfield, EventID\n",
      ")\n",
      "| distinct *\n",
      "| limit 10\n",
      "| summarize any(Message), count(Otherfield) by Message\n",
      "| order by Message desc, Otherfield\n"
     ]
    }
   ],
   "source": [
    "sql=\"\"\"\n",
    "SELECT DISTINCT Message, COUNT(Otherfield)\n",
    "FROM (SELECT *\n",
    "    FROM (SELECT EventID, ParentImage, Image, Message, Otherfield FROM apt29Host)\n",
    "\n",
    "    UNION\n",
    "    SELECT DISTINCT Message, Otherfield, EventID\n",
    "    FROM (SELECT EventID, ParentImage, Image, Message, Otherfield FROM apt29Host) as A\n",
    "    INNER JOIN MyTable on MyTable.mssg = A.Message\n",
    "    WHERE Channel = \"Microsoft-Windows-Sysmon/Operational\"\n",
    "        AND EventID = 1\n",
    "        AND LOWER(ParentImage) LIKE \"%explorer.exe\"\n",
    "        AND LOWER(Image) RLIKE \".*3aka3%\"\n",
    "    LIMIT 10\n",
    "    )\n",
    "GROUP BY Message\n",
    "ORDER BY Message DESC, Otherfield\n",
    "\"\"\"\n",
    "\n",
    "kql = sql_to_kql(sql, table_map)\n",
    "print(kql)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Aliased and Calculated Select Columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SecurityEvent\n",
      "| extend ParentMessage = ParentImage + Message, Otherfield = tolower(Otherfield)\n",
      "| project ID = EventID, ParentImage, Image, Message, ParentMessage, Otherfield\n",
      "| where Channel == 'Microsoft-Windows-Sysmon/Operational'\n",
      "  and EventID == 1\n",
      "  and tolower(ParentImage) endswith 'explorer.exe'\n",
      "| extend Otherfield = count(Otherfield)\n",
      "| project mssg = Message, Otherfield\n",
      "| distinct *\n"
     ]
    }
   ],
   "source": [
    "sql=\"\"\"\n",
    "SELECT DISTINCT Message as mssg, COUNT(Otherfield)\n",
    "FROM (SELECT EventID as ID, ParentImage, Image, Message,\n",
    "    ParentImage + Message as ParentMessage,\n",
    "    LOWER(Otherfield) FROM apt29Host\n",
    "    )\n",
    "WHERE Channel = \"Microsoft-Windows-Sysmon/Operational\"\n",
    "    AND EventID = 1\n",
    "    AND LOWER(ParentImage) LIKE \"%explorer.exe\"\n",
    "\"\"\"\n",
    "kql = sql_to_kql(sql, table_map)\n",
    "print(kql)"
   ]
  }
 ],
 "metadata": {
  "history": [],
  "kernelspec": {
   "display_name": "Python 3.11.0b3 64-bit ('3.11-dev')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  },
  "uuid": "c858a45f-dae9-4c23-9081-30d4c56ee45b",
  "vscode": {
   "interpreter": {
    "hash": "63734a8610aa4e85dd677d679352ef99e2913d65a2fb9ed1a856dee792b5b5ab"
   }
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
